We present an algorithmic framework for pluripotent structures that evolve from a simple compact configuration into a variety of complex 3-D structures, enabling the design of shape-morphing, reconfigurable, and deployable structures and robots. Our algorithmic approach offers a way to transform a compact structure composed of uniform building blocks into a large, desired 3-D shape. Analogous to pluripotent stem cells, which grow into preprogrammed shapes according to encoded information that we call DNA, a compact plate termed the zygote structure can evolve into an arbitrary 3-D structure by programming its connection path. Our stacking algorithm obtains this encoded sequence by inversely stacking the voxelized surface of the desired structure into a compact stack. By applying the connection path obtained from the stacking algorithm, the compact stacked plates of a given zygote structure can be deployed into various large-scale 3-D structures. We conceptually demonstrate our pluripotent evolving structures by releasing them with commercial spring hinges and with thermally actuated shape-memory alloy (SMA) hinges, respectively. We also show that the proposed concept enables the fabrication of large structures within a smaller workspace.
translated by 谷歌翻译
Compared with sign language recognition (SLR), sign language translation (SLT) is a task that has received relatively little study. SLR recognizes the unique grammar of sign language, which differs from that of spoken language and cannot be easily interpreted by non-signers. We therefore address the problem of translating sign language videos directly into spoken language. To this end, we propose a new keypoint normalization method that performs translation based on the signer's skeleton points and normalizes these points robustly for sign language translation. It contributes to performance improvement through a normalization method customized to each body part. In addition, we propose a stochastic frame selection method that achieves frame augmentation and sampling simultaneously. Finally, the video is translated into spoken language through an attention-based translation model. Our method can be applied in a gloss-free manner to a variety of datasets. Moreover, quantitative experimental evaluation demonstrates the superiority of our method.
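The per-body-part keypoint normalization can be sketched as follows. This is a minimal illustration under assumptions, not the paper's exact method: it assumes each part's 2D keypoints are translated to a part-specific origin joint and scaled by the distance between two reference joints (e.g., the shoulders); the function name is hypothetical.

```python
import math

def normalize_part(keypoints, origin, ref_a, ref_b):
    """Translate a body part's 2D keypoints so `origin` sits at (0, 0),
    then scale by the distance between two reference joints so that
    different signers and camera distances become comparable."""
    scale = math.dist(ref_a, ref_b) or 1.0  # guard against zero distance
    return [((x - origin[0]) / scale, (y - origin[1]) / scale)
            for x, y in keypoints]
```

Hands could use the wrist as origin, while the body uses the neck, which is one way a normalization can be "customized to each body part".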
虽然生成模型的最新进步为社会带来了不同的优势,但它也可以滥用恶意目的,例如欺诈,诽谤和假新闻。为了防止这种情况,进行了剧烈的研究以区分生成的图像从真实图像中的图像,但仍然存在挑战以区分训练设置之外的未经证实的图像。由于模型过度的问题引起了由特定GAN生成的培训数据而产生的数据依赖性,发生了这种限制。为了克服这个问题,我们采用自我监督计划提出一个新颖的框架。我们所提出的方法由人工指纹发生器重构GaN图像的高质量人工指纹进行详细分析,并且通过学习重建的人造指纹来区分GaN图像。为了提高人工指纹发生器的泛化,我们构建具有不同数量的上耦层的多个自动泊。利用许多消融研究,即使不利用训练数据集的GaN图像,也通过表现出先前最先进的算法的概括来验证我们的方法的鲁棒广泛化。
We address the task of person search, namely localizing and re-identifying query persons from a set of raw scene images. Recent approaches are typically built upon OIMNet, a pioneering work on person search, which learns joint person representations for performing both detection and person re-identification (reID). To obtain the representations, they extract features from pedestrian proposals and then project them onto a unit hypersphere with L2 normalization. These methods also exploit all positive proposals that sufficiently overlap with the ground truth equally when learning person representations for reID. We have found that 1) L2 normalization without considering feature distributions degrades the discriminative power of person representations, and 2) positive proposals often also depict background clutter and overlapping persons, which could encode noisy features into person representations. In this paper, we introduce OIMNet++, which addresses the aforementioned limitations. To this end, we introduce a novel normalization layer, dubbed ProtoNorm, that calibrates features from pedestrian proposals while considering the long-tailed distribution of person IDs, making the L2-normalized person representations discriminative. We also propose a localization-aware feature learning scheme that encourages better-aligned proposals to contribute more to learning discriminative representations. Experimental results and analyses on standard person search benchmarks demonstrate the effectiveness of OIMNet++.
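The motivation behind ProtoNorm can be sketched as follows. This is a hedged illustration, not the paper's layer: it assumes features are standardized using the statistics of per-ID prototypes (the mean feature of each identity) rather than of all samples, so frequent IDs in a long-tailed distribution do not dominate the statistics before L2 normalization.

```python
import math

def proto_norm(features, ids, eps=1e-6):
    """Standardize features with per-ID prototype statistics, then
    L2-normalize onto the unit hypersphere."""
    protos = []
    for pid in set(ids):
        members = [f for f, i in zip(features, ids) if i == pid]
        # Prototype: mean feature of one identity, so each ID counts once.
        protos.append([sum(col) / len(members) for col in zip(*members)])
    mu = [sum(col) / len(protos) for col in zip(*protos)]
    var = [sum((v - m) ** 2 for v in col) / len(protos)
           for col, m in zip(zip(*protos), mu)]
    out = []
    for f in features:
        cal = [(v - m) / math.sqrt(s + eps) for v, m, s in zip(f, mu, var)]
        norm = math.sqrt(sum(v * v for v in cal)) or 1.0
        out.append([v / norm for v in cal])
    return out
```

With a head-heavy ID distribution, the prototype mean stays balanced where a plain sample mean would be pulled toward the frequent identity.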
3D-aware image synthesis focuses on preserving spatial consistency in addition to generating high-resolution images with fine details. Recently, the Neural Radiance Field (NeRF) has been introduced for synthesizing novel views with low computational cost and superior performance. While several works investigate generative NeRFs and show remarkable achievements, they cannot handle conditional and continuous feature manipulation in the generation procedure. In this work, we introduce a novel model, called Class-Continuous Conditional Generative NeRF ($\text{C}^{3}$G-NeRF), which can synthesize conditionally manipulated photorealistic 3D-consistent images by projecting conditional features to the generator and the discriminator. The proposed $\text{C}^{3}$G-NeRF is evaluated on three image datasets, AFHQ, CelebA, and Cars. As a result, our model shows strong 3D consistency with fine details and smooth interpolation in conditional feature manipulation. For instance, $\text{C}^{3}$G-NeRF exhibits a Fr\'echet Inception Distance (FID) of 7.64 in 3D-aware face image synthesis at a $\text{128}^{2}$ resolution. Additionally, we provide FIDs of generated 3D-aware images for each class of the datasets, as it is possible to synthesize class-conditional images with $\text{C}^{3}$G-NeRF.
In both terrestrial and marine ecology, physical tagging is a frequently used method to study population dynamics and behavior. However, such tagging techniques are increasingly being replaced by individual re-identification using image analysis. This paper introduces a contrastive learning-based model for identifying individuals. The model uses the first parts of the Inception v3 network, supported by a projection head, and we use contrastive learning to find similar or dissimilar image pairs from a collection of uniform photographs. We apply this technique for corkwing wrasse, Symphodus melops, an ecologically and commercially important fish species. Photos are taken during repeated catches of the same individuals from a wild population, where the intervals between individual sightings might range from a few days to several years. Our model achieves a one-shot accuracy of 0.35, a 5-shot accuracy of 0.56, and a 100-shot accuracy of 0.88, on our dataset.
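Once the contrastive model produces embeddings, re-identification reduces to nearest-neighbor matching in embedding space. The following sketch shows one-shot identification by cosine similarity; the names and the gallery structure are illustrative assumptions, not the paper's code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def identify(query, gallery):
    """Return the individual whose stored embedding is most similar to
    the query embedding, as in one-shot re-identification."""
    return max(gallery, key=lambda name: cosine(query, gallery[name]))
```

The k-shot accuracies reported above correspond to matching against galleries with k reference photographs per individual.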
Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality and outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
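The greedy policy described above can be sketched on discrete data, where conditional mutual information has a simple empirical estimate. This is the oracle version the paper amortizes with a learned network, not the proposed method itself; the function names are ours.

```python
import math
from collections import Counter

def conditional_mi(xs, ys, ss):
    """Empirical conditional mutual information I(X; Y | S) over
    discrete samples, where ss holds each sample's values for the
    already-selected features."""
    n = len(xs)
    joint = Counter(zip(ss, xs, ys))
    sx = Counter(zip(ss, xs))
    sy = Counter(zip(ss, ys))
    s_cnt = Counter(ss)
    total = 0.0
    for (s, x, y), c in joint.items():
        # p(s,x,y) * log[ p(x,y|s) / (p(x|s) p(y|s)) ]
        total += (c / n) * math.log(c * s_cnt[s] / (sx[s, x] * sy[s, y]))
    return total

def greedy_select(X, y, budget):
    """At each step, query the feature with the highest conditional
    mutual information with the label given the features chosen so far."""
    selected = []
    remaining = list(range(len(X[0])))
    for _ in range(budget):
        context = [tuple(row[k] for k in selected) for row in X]
        best = max(remaining, key=lambda j: conditional_mi(
            [row[j] for row in X], y, context))
        selected.append(best)
        remaining.remove(best)
    return selected
```

In practice this oracle is unavailable, which is exactly why the paper trains an amortized network to mimic the greedy choice.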
The purpose of this work was to tackle practical issues which arise when using a tendon-driven robotic manipulator with a long, passive, flexible proximal section in medical applications. A separable robot which overcomes difficulties in actuation and sterilization is introduced, in which the body containing the electronics is reusable and the remainder is disposable. A control input which resolves the redundancy in the kinematics and a physical interpretation of this redundancy are provided. The effect of a static change in the proximal section angle on bending angle error was explored under four testing conditions for a sinusoidal input. Bending angle error increased with increasing proximal section angle for all testing conditions, with an average error reduction of 41.48% for re-tension, 4.28% for hysteresis, and 52.35% for re-tension + hysteresis compensation relative to the baseline case. Two major sources of error in tracking the bending angle were identified: time delay from hysteresis and DC offset from the proximal section angle. Examination of these error sources revealed that the simple hysteresis compensation was most effective for removing time delay, and re-tension compensation for removing DC offset, which was the primary source of increasing error. The re-tension compensation was also tested for dynamic changes in the proximal section and reduced error in the final configuration of the tip by 89.14% relative to the baseline case.
With the rapid development of drone technologies, drones are widely used in many applications, including military domains. In this paper, we propose a novel situation-aware DRL-based autonomous nonlinear drone mobility control algorithm for cyber-physical loitering munition applications. On the battlefield, designing a DRL-based autonomous control algorithm is not straightforward because real-world data gathering is generally unavailable. Therefore, our approach is to construct a cyber-physical virtual environment with Unity. Based on the virtual cyber-physical battlefield scenarios, a DRL-based automated nonlinear drone mobility control algorithm can be designed, evaluated, and visualized. Moreover, many obstacles exist that are harmful to linear trajectory control in real-world battlefield scenarios. Thus, our proposed autonomous nonlinear drone mobility control algorithm utilizes situation-aware components that are implemented with a Raycast function in Unity virtual scenarios. Based on the gathered situation-aware information, the drone can autonomously and nonlinearly adjust its trajectory during flight. This approach is clearly beneficial for avoiding obstacles in obstacle-deployed battlefields. Our visualization-based performance evaluation shows that the proposed algorithm is superior to other linear mobility control algorithms.
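The situation-aware raycast check can be illustrated outside Unity with a 2D occupancy grid. This sketch is an analogy to Unity's Physics.Raycast under our own assumptions, not the paper's implementation: the drone would steer away whenever a hit is closer than some safety distance.

```python
def raycast(grid, x, y, dx, dy, max_steps=50):
    """Step along direction (dx, dy) from (x, y) through a 2D occupancy
    grid (truthy cells are obstacles) and return the number of steps to
    the first blocked cell, or None if the ray stays clear."""
    for step in range(1, max_steps + 1):
        cx = int(round(x + dx * step))
        cy = int(round(y + dy * step))
        if not (0 <= cy < len(grid) and 0 <= cx < len(grid[0])):
            return None  # ray left the map without hitting anything
        if grid[cy][cx]:
            return step
    return None
```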
In the robotics and computer vision communities, extensive studies have been conducted on surveillance tasks, including human detection, tracking, and motion recognition with a camera. Additionally, deep learning algorithms are widely utilized in the aforementioned tasks, as in other computer vision tasks. Existing public datasets are insufficient for developing learning-based methods that handle various surveillance tasks in outdoor and extreme situations such as harsh weather and low-illuminance conditions. Therefore, we introduce a new large-scale outdoor surveillance dataset named eXtremely large-scale Multi-modAl Sensor dataset (X-MAS), containing more than 500,000 image pairs and first-person-view data annotated by well-trained annotators. Moreover, a single pair contains multi-modal data (e.g. an IR image, an RGB image, a thermal image, a depth image, and a LiDAR scan). To the best of our knowledge, this is the first large-scale first-person-view outdoor multi-modal dataset focusing on surveillance tasks. We present an overview of the proposed dataset with statistics and present methods of exploiting our dataset with deep learning-based algorithms. The latest information on the dataset and our study are available at https://github.com/lge-robot-navi, and the dataset will be available for download through a server.